Functions Describing Survival Distribution
Abstract
Censoring and non-normality are common features of survival data. Because of these characteristics, conventional statistical methods such as linear regression or logistic regression are inappropriate for analyzing survival data. When observations are censored, survival time cannot be treated as a fully observed continuous variable. Linear regression, which compares mean time-to-event between groups, cannot correctly handle censored data. Logistic regression compares the proportion of events between groups using the odds ratio, but it ignores differences in the timing of event occurrence. When subjects have unequal survival times, analyzing survival as a dichotomous variable with a chi-square test fails to account for this non-comparability between subjects. This paper provides an overview of survival analysis and describes its principles and applications. Examples with SAS programming illustrate the LIFEREG, LIFETEST, PHREG and QUANTLIFE procedures for survival analysis. New developments are also discussed, including time-dependent covariates, recurrent events, quantile regression for identifying important prognostic factors in patient subpopulations, and joint modeling of longitudinal data (such as quality of life) and time-to-event data.

What is Survival Analysis

Survival analysis is used predominantly when the interest is in the time until an event occurs. In the biomedical sciences, the quantity of interest is often the time to death of an individual, measured from disease onset, diagnosis, or the time at which a particular treatment (e.g., surgery or chemotherapy) was applied. More generally, the event can be any qualitative change that occurs at a particular time point.
Examples of events include onset of a disease, attainment of disease-free status, onset of illness after exposure, recovery from illness, symptom relief, relapse (recurrence or progression), readmission, or a transition above or below a clinically meaningful threshold of a continuous variable (e.g., CD4 counts, or platelet counts after marrow transplantation). In clinical trials, the starting reference time may be randomization, first treatment, onset of exposure, diagnosis, or surgery.

What Does Survival Analysis Do

When the study period is long enough to observe the survival time of all subjects, more common methods such as the t-test or regression analysis can be used, treating survival time as a continuous variable. Survival analysis can:
1. Estimate time-to-event for a group of individuals
2. Compare time-to-event between two or more groups
3. Assess the relationship of covariates to time-to-event

Functions Describing Survival Distributions

1. Cumulative Distribution Function (cdf)
The cdf is defined as F(t) = P(T ≤ t). It gives the probability that the random variable T (time of death) is less than or equal to a chosen time t. F(t) is a non-decreasing function of t that takes values between 0 and 1, and F(t) approaches 1 as t approaches ∞.

2. Probability Density Function (pdf)
The pdf describes the rate at which failures occur at time t, out of the whole range of possible values of t:
\[ f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} \]
Survival Analysis Approaches and New Developments using SAS, continued 2
The pdf is the derivative of the cdf, f(t) = dF(t)/dt. It is very useful for describing the continuous probability distribution of a random variable. The probability P(a < T < b) is the area under the pdf curve between time a and time b.

Figure 1. Survival Distribution Functions

3. Survival Function (sdf)
In survival analysis, the survival function is of the most interest. It is defined as S(t) = P(T > t).
The survival function is the probability that the time of death is later than some specified time t. S(t) is a non-increasing function that takes values between 0 and 1: S(0) = 1, and S(t) approaches 0 as t approaches ∞. It is the complement of the cdf: S(t) = P(T > t) = 1 - F(t).

4. Hazard Function
S(t) is the prevalence of not failing, while h(t) is the incidence of failure. The hazard rate is an unobserved variable, yet it is the fundamental dependent variable controlling both the occurrence and the timing of events. The hazard function is the limiting conditional probability of failure in the next short interval, per unit time, given survival to the start of that interval. Models for survival analysis can be built from a hazard function, which measures the risk of failure of an individual at time t. The hazard function h(t) is given by the following formula:
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]
The hazard function is often more intuitive to use in survival analysis than the pdf because it quantifies the instantaneous risk that an event will take place at time t given that the subject has survived to time t. The hazard function is never negative; where it is zero, failure is impossible at that time. It can be constant over time (exponential distribution) or increasing or decreasing with time (Weibull distribution). Higher values of h(t) indicate a greater risk of failure. The survival function and the hazard function are closely connected: larger values of h(t) yield lower values of S(t), since S(t) measures the probability of not failing. The hazard is the basis for regression modeling, and it is related to the other functions by
h(t) = f(t) / (1 - F(t)) = f(t) / S(t).
Once we have modeled the hazard function, we can easily obtain the other functions.

Nature of Survival Time Data

Survival data measure lifetime, or the length of time until the occurrence of an event. They are usually not normally distributed, and they also involve censoring. A censored observation is an observation with incomplete information, and survival time is often censored.
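The relationships among the four functions defined in the preceding section (F, f, S and h) can be checked numerically. The following is a minimal Python sketch, assuming a Weibull distribution with hypothetical shape and scale values chosen only for illustration; the function names are not from the paper. It computes h(t) as f(t) / S(t) and reports whether the hazard is constant, increasing or decreasing.

```python
import math

def weibull_pdf(t, shape, scale):
    """f(t) for a Weibull distribution; shape = 1 is the exponential case."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_cdf(t, shape, scale):
    """F(t) = P(T <= t)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_survival(t, shape, scale):
    """S(t) = P(T > t) = 1 - F(t)."""
    return 1.0 - weibull_cdf(t, shape, scale)

def weibull_hazard(t, shape, scale):
    """h(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)

def hazard_trend(shape, scale=2.0, times=(0.5, 1.0, 2.0, 4.0), tol=1e-9):
    """Classify the hazard over a few illustrative time points."""
    values = [weibull_hazard(t, shape, scale) for t in times]
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d > tol for d in diffs):
        return "increasing"
    if all(d < -tol for d in diffs):
        return "decreasing"
    return "mixed"

# Hypothetical shape values: below 1, equal to 1, above 1.
for shape in (0.5, 1.0, 2.0):
    print(shape, hazard_trend(shape))
```

With shape equal to 1 the Weibull reduces to the exponential distribution, so the hazard equals 1 / scale at every time point, which matches the constant-hazard case described in the text.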
Survival time may be known only to be greater than a certain amount (right censored), less than a certain amount (left censored), or within a certain range (interval censored). Of the three types of censoring, right censoring is the most common. Right censoring occurs because subjects are removed from the study before failure or because failure occurs after the end of data collection. The purpose of survival analysis is to follow subjects over time and observe the time point at which they experience the event of interest. Often, for a number of reasons, the study is not long enough for the event to occur for all subjects. Subjects may also drop out of the study for reasons unrelated to the study (e.g., patients moving to another area and leaving no forwarding address). Had such a subject been able to stay in the study, it would eventually have been possible to observe the event.

A censored survival time is usually flagged by a censoring variable. For example, the variable LIFETIME represents either a failure time or a censoring time. The variable CENSOR is equal to 0 if the value of LIFETIME is a failure time, and it is usually equal to 1 if the value is a censoring time.

Another characteristic of survival data is that the response cannot be negative. This suggests that a transformation of the survival time, such as a log transformation, may be necessary, or that special methods may be more appropriate than those that assume a normal distribution for the error term. It is especially important to check any underlying assumptions as part of the analysis because some of the models used are very sensitive to these assumptions.

Considerations of Survival Analysis Methods

Special considerations need to be taken when analyzing survival time data.
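The LIFETIME and CENSOR coding described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `code_observation`, its arguments, and the toy subject records are hypothetical and do not come from the paper. Each subject contributes the earliest of the recorded event time, the dropout time, and the end of the study, with CENSOR = 0 for an observed failure and CENSOR = 1 for a right-censored time.

```python
def code_observation(event_time, dropout_time, study_end):
    """Return (LIFETIME, CENSOR) for one subject.

    event_time   -- time of the event if it was recorded, else None
    dropout_time -- time the subject left the study early, else None
    study_end    -- time at which data collection stopped
    CENSOR is 0 for an observed failure and 1 for a right-censored time.
    """
    candidates = [(study_end, 1)]
    if dropout_time is not None:
        candidates.append((dropout_time, 1))
    if event_time is not None:
        candidates.append((event_time, 0))
    # Earliest time wins; on a tie the observed failure (0) is preferred.
    return min(candidates)

# Hypothetical subjects: (event_time, dropout_time), study ends at month 12.
subjects = [(5.0, None), (None, 7.0), (None, None), (15.0, None)]
for event_time, dropout_time in subjects:
    print(code_observation(event_time, dropout_time, study_end=12.0))
```

The last toy subject has an event after the end of data collection, so the recorded LIFETIME is the study end and the observation is right censored.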
Censoring and non-normality cause great difficulty when survival data are analyzed with traditional statistical models such as multiple linear regression. Data that have censored or truncated observations are said to be incomplete, and the analysis of such data requires special techniques. Censored observations cannot simply be ignored because, among other considerations, longer-lived individuals are generally more likely to be censored.

Logistic regression analysis could be applied to quantify the importance of certain covariates in classifying individuals into two groups: those that did and those that did not experience the event during the period of observation. However, this approach can result in considerable loss of information because differences in the timing of event occurrence are not considered. Alternatively, one could use linear regression analysis to identify covariates that influence survival times. The major drawback is that survival data are often censored, i.e., they contain observations for which one does not know when the event occurs. With conventional statistical methodology, censored observations would either have to be deleted, or their survival times would have to be set to the total time period from onset to termination of the study.

The proper method of survival data analysis is to take censoring into account and to use censored observations as well as uncensored observations correctly. The likelihood-based parameter estimation methods used in survival analysis can effectively extract relevant information from both censored and uncensored observations, thereby producing reliable parameter estimates. Survival analysis methods also readily accommodate time-dependent covariates, i.e., independent variables whose values change during the course of the study. Disease severity is an example of a time-dependent covariate: severity values change over time, and they do so differently for each individual.
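To make the contrast above concrete, here is a minimal Python sketch of a likelihood-based estimate that uses both censored and uncensored observations. It assumes an exponential survival model, which is an assumption made here for illustration and not a recommendation from the text; the data are made-up toy values and the function names are hypothetical. Under that model with right censoring, the log-likelihood is d log(λ) - λ Σt, where d is the number of observed failures and Σt is the total follow-up time, so the maximum likelihood estimate of the rate is d / Σt.

```python
def exponential_mle_rate(lifetimes, censor):
    """MLE of the exponential rate with right censoring (CENSOR: 0 = failure)."""
    failures = sum(1 for c in censor if c == 0)
    total_time = sum(lifetimes)  # censored subjects still contribute time at risk
    return failures / total_time

def mean_dropping_censored(lifetimes, censor):
    """Naive approach 1: delete censored observations, average the rest."""
    observed = [t for t, c in zip(lifetimes, censor) if c == 0]
    return sum(observed) / len(observed)

def mean_treating_censored_as_failures(lifetimes):
    """Naive approach 2: pretend every censoring time is a failure time."""
    return sum(lifetimes) / len(lifetimes)

# Toy data (hypothetical): LIFETIME in months, CENSOR = 1 means right censored.
lifetime = [2, 3, 4, 5, 6, 6]
censor = [0, 0, 1, 0, 1, 1]

rate = exponential_mle_rate(lifetime, censor)
print("MLE mean survival:", 1 / rate)
print("Drop censored:    ", mean_dropping_censored(lifetime, censor))
print("Censored as event:", mean_treating_censored_as_failures(lifetime))
```

In this toy data set both naive approaches give a shorter mean survival time than the likelihood-based estimate, because they discard or misread the follow-up time contributed by the censored subjects.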
There are basically three approaches to analyzing survival data: parametric, non-parametric and semi-parametric. Parametric methods assume that the distribution of the survival times is known, e.g., exponential, Weibull, normal, log-logistic or gamma. Non-parametric methods, e.g., the Kaplan-Meier estimator, make no assumptions about the distribution of the survival time. Semi-parametric models assume a parametric form for the effects of the explanatory variables but make no assumptions about the distribution of the survival time.

Survival Analysis Procedures

There are three SAS procedures for analyzing survival data: LIFEREG, LIFETEST and PHREG. PROC LIFETEST is a nonparametric procedure for estimating the distribution of survival time, comparing survival curves from different groups, and testing the association of survival time with other variables. PROC LIFEREG and PROC PHREG are regression procedures for modeling the distribution of survival time with a set of concomitant variables.

Table 1. SAS Procedures for Survival Analysis

Applications | PROC LIFEREG | PROC LIFETEST | PROC PHREG
Assumption of underlying survival time distribution | Must be specified (e.g., exponential, Weibull, gamma) | Shape not specified | Shape not specified
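As an illustration of the non-parametric approach mentioned above, the following Python sketch computes a Kaplan-Meier (product-limit) estimate of the survival function. It does not call any SAS procedure; it is a pure-Python sketch of the product-limit idea, with a hypothetical function name and made-up toy data, using the CENSOR coding from the text (0 = failure, 1 = censored).

```python
def kaplan_meier(lifetimes, censor):
    """Return [(time, S(time))] at each distinct failure time.

    At each failure time, S is multiplied by (1 - failures / number at risk).
    Subjects censored at a given time count as at risk at that time.
    """
    data = sorted(zip(lifetimes, censor))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Gather everyone whose recorded time equals t.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 0:
                failures += 1
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Toy data (hypothetical): LIFETIME in months, CENSOR = 1 means right censored.
lifetime = [2, 3, 4, 5, 6, 6]
censor = [0, 0, 1, 0, 1, 1]
for t, s in kaplan_meier(lifetime, censor):
    print(t, round(s, 4))
```

The subject censored at month 4 lowers the number at risk for the failure at month 5 without being counted as a failure, which is how the estimator uses censored observations without assuming a distributional shape.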